SOME APPLICATIONS OF

THE THEORY OF DYNAMIC PROGRAMMING: A REVIEW


Richard Bellman


§1. Introduction

In this expository paper, dedicated to an introduction to
and an illustration of the techniques of the theory of dynamic
programming, we shall consider two problems of rather simple form.

Problem 1 (Optimal Allocation).

We are given a resource, x, to divide into two parts, y
and x - y. From y we obtain a return of g(y); from x - y a return
of h(x - y). In so doing, we expend a certain amount of the
original quantity and are left with a new quantity, ay + b(x - y),
where 0 < a, b < 1. This process is now continued. How does one
allocate at each stage so as to maximize the total return obtained
over a finite or unbounded number of stages?

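Problem 2 (Efficient Gold Mining).

We are fortunate enough to possess two gold mines, Anaconda
and Bonanza, the first of which contains an amount x of gold,
while the second possesses an amount y. In addition, we have a
rather delicate gold-mining machine which has the property that
if used to mine gold in Anaconda, there is a probability p_1 that
it will mine a fraction r_1 of the gold there and remain in
working order, and a probability 1 - p_1 that it will mine no gold
and be damaged beyond repair. Similarly, if used in Bonanza, it
will mine a fraction r_2 of the gold there and remain in working
order with probability p_2, and will mine no gold and be damaged
with probability 1 - p_2.

What sequence of choices maximizes the amount of gold mined
before the machine is damaged?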

Insofar as these problems involve multi-stage processes,
large numbers of variables (when formulated in classical terms),
chance events (in the second case), and the determination of
policies rather than functions, they typify a very large set of
important and difficult problems which have arisen in recent
years to plague the economist, industrialist, strategist, and
through them, the mathematician.

The methods we shall employ to treat the above questions
constitute a part of the theory of dynamic programming, a
mathematical theory which has been created over the last few years
specifically to meet the challenge of these problems.
Applications of the theory have already been made to the theory of
investment and allocation, to logistics, to testing and learning
theory, to problems of purchasing and inventory, to scheduling,
to the planning of industrial and economic processes, and to
control problems in engineering and economics.

§2. Optimal Allocation: Classical Formulation

Let us now see how Problem 1 above would be attacked,
employing conventional techniques.

If there is only one stage to the process, the total
return is

(2.1)  R_1(x, y) = g(y) + h(x - y).

The problem of maximizing R_1(x, y) over y in the interval
0 \le y \le x may be solved readily by means of calculus, or
graphically.

If there are two stages, let y_1 be the choice in the first
step and y_2 the choice at the second; then

(2.2)  R_2(x, y_1, y_2) = g(y_1) + h(x_1 - y_1) + g(y_2) + h(x_2 - y_2),

where

(2.3)  x_1 = x,  x_2 = a y_1 + b(x_1 - y_1),

and y_1 and y_2 are constrained by

(2.4)  0 \le y_1 \le x_1,  0 \le y_2 \le x_2.

Quite generally, if there are N stages, the total return
due to successive allocations of y_1, y_2, \ldots, y_N will be

(2.5)  R_N(x, y_1, y_2, \ldots, y_N) = g(y_1) + h(x_1 - y_1) + g(y_2)
           + h(x_2 - y_2) + \cdots + g(y_N) + h(x_N - y_N),

where

(2.6)  x_1 = x,
       x_2 = a y_1 + b(x_1 - y_1),
       \vdots
       x_N = a y_{N-1} + b(x_{N-1} - y_{N-1}),

and (y_1, y_2, \ldots, y_N) lies in the region R:

(2.7)  0 \le y_1 \le x_1,
       0 \le y_2 \le x_2,
       \vdots
       0 \le y_N \le x_N.

Even for small N the problem of determining the maximum of
R_N over the region described by the inequalities of (2.7) is a
problem of formidable proportions, particularly since some of
the extrema may lie at endpoints of the intervals in (2.7), thus
rendering a direct application of calculus impossible.
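
To make the size of the classical problem concrete, the following
Python sketch evaluates R_N of (2.5) and searches a lattice of points
in the region (2.7). The names total_return and brute_force, and the
device of describing each y_k as a fraction of the current x_k, are
illustrative assumptions and not part of the original treatment.

```python
from itertools import product

def total_return(x, fracs, g, h, a, b):
    """R_N of (2.5). fracs[k] is the fraction of the current stock x_k
    assigned to g at stage k+1, so that y_k = fracs[k] * x_k obeys (2.7)."""
    total = 0.0
    for frac in fracs:
        y = frac * x
        total += g(y) + h(x - y)
        x = a * y + b * (x - y)      # recurrence (2.6)
    return total

def brute_force(x, n_stages, g, h, a, b, steps=10):
    """Try every point of a (steps+1)**n_stages lattice in the region (2.7)."""
    levels = [i / steps for i in range(steps + 1)]
    best = max(product(levels, repeat=n_stages),
               key=lambda fracs: total_return(x, fracs, g, h, a, b))
    return total_return(x, best, g, h, a, b), best
```

The number of lattice points grows as (steps+1)**N, which is the
numerical counterpart of the difficulty just described.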
§3. Optimal Allocation: Functional Equation Approach

The key to a different and more fruitful approach to
Problem 1 is the petulant comment that the conventional approach
provides too much information, far more than the practical man
carrying out the process needs. He does not need the values of
all the y_i; he needs only the value of y_1, given N and x.

Let us then use this observation to provide a different
formulation.

To begin with, let us call any choice of y_1, y_2, \ldots, y_N, for
an N-stage process, a policy, and call any policy which yields the
maximum value of R_N(x, y_1, y_2, \ldots, y_N) an optimal policy.
Observing that the total return obtained using an optimal policy
depends only upon x, the initial quantity of money, and N, the
number of stages, we define

(3.1)  f_N(x) = return obtained from an N-stage process
                given an initial amount x and employing an
                optimal policy.

Using this notation, let us compute the total return obtained
using an initial division of x into y and x - y in the first step
of an N-stage process. The immediate return due to the initial
allocation will be g(y) + h(x - y), and we will have ay + b(x - y)
with which to continue for N - 1 remaining stages. It is clear
that whatever the choice of y initially, the remaining amount,
ay + b(x - y), will be used optimally for the N - 1 remaining stages,
yielding, therefore, a further return of f_{N-1}(ay + b(x - y)).
Hence, the total N-stage return due to an initial allocation of y
will be

(3.2)  R_N(x, y) = g(y) + h(x - y) + f_{N-1}(ay + b(x - y)).


By definition,

(3.3)  f_N(x) = \max_{0 \le y \le x} R_N(x, y)
              = \max_{0 \le y \le x} [ g(y) + h(x - y) + f_{N-1}(ay + b(x - y)) ].


This is the basic functional equation for the sequence
\{f_N(x)\}. Its importance lies in the fact that it translates a
problem in policy space into one in the more familiar function
space.
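
A minimal numerical sketch of the recurrence (3.3) follows. It rests
on assumptions that are not in the text: x is restricted to a uniform
grid on [0, x_max], the maximizing y is sought among grid points only,
f_{N-1} is evaluated between grid points by linear interpolation, and
f_0 is taken to be zero. The name solve_stages is hypothetical.

```python
def solve_stages(g, h, a, b, x_max, n_stages, grid=100):
    """Tabulate f_1, ..., f_N of (3.3) on a uniform grid over [0, x_max].
    Returns (xs, fs, ys) with fs[n-1][i] ~ f_n(xs[i]) and ys[n-1][i]
    the grid value of y attaining the maximum."""
    xs = [x_max * i / grid for i in range(grid + 1)]
    step = x_max / grid

    def interp(table, x):
        # piecewise-linear interpolation of a tabulated function at x
        pos = min(max(x / step, 0.0), float(grid))
        i = min(int(pos), grid - 1)
        t = pos - i
        return (1 - t) * table[i] + t * table[i + 1]

    prev = [0.0] * (grid + 1)          # f_0 = 0: no stages, no return
    fs, ys = [], []
    for _ in range(n_stages):
        cur, arg = [], []
        for i, x in enumerate(xs):
            best_val, best_y = float("-inf"), 0.0
            for y in xs[:i + 1]:       # candidates 0 <= y <= x
                val = g(y) + h(x - y) + interp(prev, a * y + b * (x - y))
                if val > best_val:
                    best_val, best_y = val, y
            cur.append(best_val)
            arg.append(best_y)
        fs.append(cur)
        ys.append(arg)
        prev = cur
    return xs, fs, ys
```

Each stage costs on the order of grid**2 evaluations, so the work
grows linearly with N rather than exponentially.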


§4. Computational Techniques

Let us now see what we have accomplished by converting the
problem from that of maximizing the function of N variables in
(2.5) to that of determining the sequence \{f_N(x)\}. In the first
place, we have presented ourselves with a nonlinear sequence
of functional equations possessing all the difficulties attendant
upon nonlinear equations. In return, however, we have
reduced the dimension of the problem from N to 1 and thus
considerably simplified the analytic and computational aspects of
the problem.

Beginning with f_1(x), which is given by

(4.1)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x - y) ],

we may compute f_2(x), f_3(x), and so on, using (3.3). In the
course of the computation of f_N(x) we automatically compute
y(x) = y_N(x), which is actually the essential information.

Conversely, given y_N(x) for each N and x we may compute
f_N(x) recursively. We have then a duality between the maximum
return, f_N(x), and the optimal policy, symbolized by y_N(x). A
knowledge of either enables the other to be computed.

Let us now exploit this fact. Since the amount remaining
after each stage decreases geometrically, it is clear that for
large N there will be little difference between f_N(x) and
f_{N+1}(x), assuming, of course, that g(0) = h(0) = 0 and that
g and h are continuous near zero. It follows that for large N
we may write

(4.2)  f(x) = f_\infty(x) \approx f_N(x)

and replace the sequence of equations in (3.3) by the one equation


(4.3)  f(x) = \max_{0 \le y \le x} [ g(y) + h(x - y) + f(ay + b(x - y)) ].


This equation, with the solution fixed by the requirement
that f(0) = 0, may now be solved by successive approximations.

One set of approximations is, of course, the sequence \{f_N(x)\}
determined above. However, we may do much better in the following
way: instead of seeking approximations in function space,
let us look for approximations in policy space; which is to
say, instead of approximating to f(x), the maximum return, let us
approximate to y(x), the optimal allocation.

In many of these problems, experience will have yielded a
great deal of information concerning optimal policies, and it
is precisely in this type of approximation that this experience
can be put to best use.

Let us illustrate: In solving (4.3), we may consider the
following possible policies, each of which has some intuitive
basis:

(a)  At each stage let y = x if g(x)/((1 - a)x) > h(x)/((1 - b)x),
     and y = 0 otherwise.

(b)  Choose y so that [condition (4.4) is illegible in the source].

Let f_0(x) be the return calculated by recurrence,
using one or the other of these policies. We may now compute
successive approximations by means of the relation

(4.5)  f_{n+1}(x) = \max_{0 \le y \le x} [ g(y) + h(x - y) + f_n(ay + b(x - y)) ],
       n = 0, 1, 2, \ldots.

The important point to emphasize is that we clearly have

(4.6)  f_0(x) \le f_1(x) \le f_2(x) \le \cdots.

Thus each approximation is automatically an improvement.
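
The approximation in policy space may be sketched in Python as
follows. The helper names (make_grid, interp, policy_return, improve),
the uniform grid, the linear interpolation, and the fixed number of
sweeps used to evaluate a policy are all assumptions made for
illustration; the text itself prescribes only the relation (4.5).

```python
def make_grid(x_max, grid):
    return [x_max * i / grid for i in range(grid + 1)]

def interp(table, x, step):
    """Piecewise-linear lookup of a tabulated function at x >= 0."""
    last = len(table) - 1
    pos = min(max(x / step, 0.0), last)
    i = min(int(pos), last - 1)
    t = pos - i
    return (1 - t) * table[i] + t * table[i + 1]

def policy_return(policy, g, h, a, b, xs, sweeps=200):
    """f_0(x): return from following the fixed rule y = policy(x) at every
    stage, from the recurrence f_0(x) = g(y) + h(x-y) + f_0(ay + b(x-y))."""
    step = xs[1] - xs[0]
    f = [0.0] * len(xs)
    for _ in range(sweeps):
        f = [g(policy(x)) + h(x - policy(x))
             + interp(f, a * policy(x) + b * (x - policy(x)), step)
             for x in xs]
    return f

def improve(f, g, h, a, b, xs):
    """One application of (4.5), maximizing over grid values of y."""
    step = xs[1] - xs[0]
    return [max(g(y) + h(x - y) + interp(f, a * y + b * (x - y), step)
                for y in xs[:i + 1])
            for i, x in enumerate(xs)]
```

Starting from, say, the policy y = x and applying improve repeatedly
gives a nondecreasing sequence of tables, in keeping with (4.6),
provided the initial policy always returns a grid point.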

§5. Some Typical Results

Let us now present some typical results which may be obtained
concerning the nature of the solutions of this new class of
functional equations. These results are important since they
yield first approximations to the solutions f of more complicated
equations.

Theorem 1. If g(x) and h(x) are both strictly convex functions
of x, an optimal policy requires that y = 0 or x.

The situation where g and h are both concave is more
complicated.

Theorem 2. Let

(a)  g(0) = h(0) = 0,

(b)  g'(x), h'(x) > 0, for x > 0,

(c)  g''(x), h''(x) < 0, for x \ge 0,

and consider the sequence of equations

f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x - y) ],

f_{n+1}(x) = \max_{0 \le y \le x} [ g(y) + h(x - y) + f_n(ay + b(x - y)) ],
       n = 1, 2, \ldots.

For each n, there is a unique y_n = y_n(x) which yields the
maximum. If b < a, we have y_1 \le y_2 \le y_3 \le \cdots, and the reverse
inequalities for b > a. In particular, if y_n(x) = x for some n,
in the case b \le a, then y_m(x) = x for m \ge n.

This result is useful for approximation purposes since y_1, y_2,
and even y_3 may be determined by hand computation quite quickly.

Even when g and h are convex and we know that y = 0 or x,
it is not easy to determine which is the correct y-value. The
following result is useful for approximation purposes:

Theorem 3. The solution of

(5.1)  F(x) = \max [ c x^d + F(ax),  e x^f + F(bx) ]

is given by

(5.2)  y = x  for 0 \le x < x_0,
       y = 0  for x_0 < x,

where

(5.3)  x_0 = [ (c/(1 - a^d)) / (e/(1 - b^d)) ]^{1/(f - d)}.
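
The threshold x_0 can be checked numerically. The sketch below assumes
the reading of the formula in which both denominators carry the
exponent d, and assumes c, e > 0, 0 < a, b < 1, and f different from
d; the function names are hypothetical. At x = x_0 the return from
using the first alternative at every stage should equal the return
from using the second alternative once and the first thereafter.

```python
def threshold(c, d, e, f, a, b):
    """x_0 as reconstructed above; assumes c, e > 0, 0 < a, b < 1, f != d."""
    return ((c / (1 - a ** d)) / (e / (1 - b ** d))) ** (1 / (f - d))

def all_first(x, c, d, a):
    """First alternative at every stage: sum over k of c * (a**k * x)**d."""
    return c * x ** d / (1 - a ** d)

def second_then_first(x, c, d, e, f, a, b):
    """Second alternative once, then the first alternative forever."""
    return e * x ** f + all_first(b * x, c, d, a)
```

For example, with c = e = 1, d = 1, f = 2, and a = b = 0.5 the
threshold is x_0 = 1, and both functions return 2 there.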

Another particular case where the solution may be obtained
simply is that where g and h are quadratic in x.

Let us now indicate briefly how Theorem 3, and other results
concerning the solution of particular equations, may be used to
obtain approximate solutions. Given two functions, g(x) and
h(x), we may obtain an approximate solution to equation (4.3),
if we can obtain approximations to g(x) and h(x) by means of
functions of the type c x^d and e x^f. Replacing x by e^y, we see
that this is equivalent to approximating to g(e^y) by c e^{dy}, or to
log g(e^y) by log c + dy. Consequently, to obtain our approximate
expressions, we plot log g(e^y) and log h(e^y) as functions
of y, and look for straight-line fits of the form log c + dy
for g and log e + fy for h. This may readily be done by inspection.
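
Where a plot is inconvenient, the straight-line fit may be replaced by
ordinary least squares, which is a substitute for the fit by
inspection described above and not the method of the text. In the
sketch below the name fit_power_law, the interval [x_lo, x_hi], and
the number of sample points are assumptions.

```python
import math

def fit_power_law(func, x_lo, x_hi, n=50):
    """Least-squares line through the points (u, log func(e**u)).
    Returns (c, d) with func(x) ~ c * x**d on [x_lo, x_hi].
    Requires func > 0 on the interval and 0 < x_lo < x_hi."""
    u_lo, u_hi = math.log(x_lo), math.log(x_hi)
    us = [u_lo + (u_hi - u_lo) * k / (n - 1) for k in range(n)]
    vs = [math.log(func(math.exp(u))) for u in us]
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    d = (sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
         / sum((u - u_bar) ** 2 for u in us))
    log_c = v_bar - d * u_bar          # intercept of the fitted line
    return math.exp(log_c), d
```

Applied to a function that is exactly of the form c x^d, the fit
recovers c and d up to rounding error.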


P-490


Having obtained these approximations to g and h, we use
Theorem 3 to find the exact solution of the approximate equation.
This solution has an associated policy which may be used
as an approximate policy for the original problem. This
approximate policy, in turn, yields an approximate solution,
which we may iterate, as above, to obtain monotone convergence.

In Theorems 1, 2, and 3 discussed above, we have shown
how various important properties of the optimal policy are
consequences of certain simple properties of the stage-by-stage
payoff functions. In order to determine the precise influence of
these properties upon the degree of complication of the solution, we
computed the solution of a problem in which g and h exhibited the
"diminishing return" property. We took

(5.4)  g(x) = e^{-10/x},  h(x) = e^{-15/x},

and a = .8, b = .9, and computed f(x), the solution of (4.3),
by means of successive approximations.

Below, we see the curves for f_1(x), f_2(x), and f(x). They
illustrate the slowness of successive approximations based on
successive stages, and the necessity for using the approximate
techniques mentioned above if one wishes rapid convergence.

The curve for y(x) given in Figure 4 illustrates the extreme
complexity that may be expected in an optimal policy if we
introduce functions which have points of inflection. Since these
functions occur quite frequently in applications, as
manifestations of the law of diminishing returns mentioned above, again
the importance of approximation techniques is made clear.

§6. Gold Mining

Let us now consider the second problem, the one concerning
the gold-mining machine of sensitive nature. This problem
possesses an additional feature of difficulty due to the presence of
chance mechanisms.

A policy here will consist of a choice of A's and B's,
which is to say, mining in Anaconda or in Bonanza. However, any
sequence such as

(6.1)  S = AABBBABB\ldots,

must be read: A first, then A again if the machine is undamaged;
then B if the machine is still undamaged, and so on.

If initially, to avoid any conceptual difficulties inherent
in unbounded sequences, we consider only mining processes which
end automatically after N steps, regardless of whether the
machine is damaged or not, it is quite easy to list all the possible
policies.

Since we are dealing with a stochastic process, it is not
possible to talk about the return from a policy.* We must console
ourselves with some average of the possible returns. The simplest
such is the usual average, or expected value.

Let us then agree that we are interested in the policy which
maximizes the expected value of the amount of gold mined before
the machine is damaged. Corresponding to every policy such as
(6.1), there will be an expected return. To determine an optimal
policy it is merely necessary to list all possible policies,
compute the expected returns and compare. Even if feasible,
this method is clumsy and completely unrevealing as to the
structure of an optimal policy.

* We might note in passing that this idea is a very difficult one
to explain to a neophyte at card games, particularly in explaining
the theory of a finesse.
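
For small N the enumeration can be carried out directly. The sketch
below assumes the model used in the next section: an operation in
Anaconda succeeds with probability p1 and then yields the fraction r1
of what remains there, and likewise p2 and r2 for Bonanza. The
function names are hypothetical.

```python
from itertools import product

def expected_return(policy, x, y, p1, r1, p2, r2):
    """Expected gold mined when the choices in `policy` (a string of
    'A' and 'B') are followed in order while the machine survives."""
    total, alive = 0.0, 1.0          # alive = P(machine still undamaged)
    for choice in policy:
        if choice == "A":
            alive *= p1              # the operation must succeed to pay off
            total += alive * r1 * x
            x *= 1 - r1
        else:
            alive *= p2
            total += alive * r2 * y
            y *= 1 - r2
    return total

def best_policy(n, x, y, p1, r1, p2, r2):
    """List all 2**n sequences of length n and keep the best one."""
    return max(("".join(s) for s in product("AB", repeat=n)),
               key=lambda s: expected_return(s, x, y, p1, r1, p2, r2))
```

The count of 2**n sequences is what makes the method clumsy for all
but short processes.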

§7. Functional Equation Approach

In place of the above enumerative approach, let us employ
the functional equation technique of §3. Let us also simplify
matters by going directly to the unbounded process. We define

(7.1)  f(x, y) = expected amount of gold mined before the
                 machine is damaged when A has x, B has y,
                 and an optimal policy is employed.

Let us compute the expected amount of gold mined if an A
operation is used first, a quantity we denote by f_A(x, y). The
total expected amount will be p_1 r_1 x, as a result of the initial
stage, plus the expected amount mined from the second stage on.

It is clear that an optimal policy will be pursued from this
point on if the machine survives. Hence, given that the machine
survives the first operation, an event of probability p_1, the
expected amount obtained from the second stage on will be
f((1 - r_1)x, y), since Anaconda now possesses (1 - r_1)x and
Bonanza still has y.

Thus,

(7.2)  f_A(x, y) = p_1 [ r_1 x + f((1 - r_1)x, y) ].

Similarly,

(7.3)  f_B(x, y) = p_2 [ r_2 y + f(x, (1 - r_2)y) ].


Since we wish to choose A or B so as to maximize the
over-all expected return, we have

(7.4)  f(x, y) = \max [ f_A(x, y), f_B(x, y) ],

which yields the functional equation

(7.5)  f(x, y) = \max { A:  p_1 [ r_1 x + f((1 - r_1)x, y) ],
                        B:  p_2 [ r_2 y + f(x, (1 - r_2)y) ] }.
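
A numerical approximation to f(x, y) can be had by truncating the
process after a fixed number of operations, which is an assumption
introduced here for computability and not the unbounded process of
the text. The name gold_value and the default depth are hypothetical.
Since the state after i operations in A and j in B is
((1 - r1)**i * x, (1 - r2)**j * y), it suffices to index by (i, j).

```python
from functools import lru_cache

def gold_value(x, y, p1, r1, p2, r2, depth=25):
    """Approximate f(x, y) of (7.5), allowing at most `depth` operations."""
    @lru_cache(maxsize=None)
    def f(i, j):
        if i + j == depth:
            return 0.0                       # truncation: process stops here
        xa = x * (1 - r1) ** i               # gold left in Anaconda
        yb = y * (1 - r2) ** j               # gold left in Bonanza
        f_a = p1 * (r1 * xa + f(i + 1, j))   # (7.2)
        f_b = p2 * (r2 * yb + f(i, j + 1))   # (7.3)
        return max(f_a, f_b)                 # (7.4)
    return f(0, 0)
```

The truncated value cannot decrease as depth grows, and the neglected
tail is small once the surviving probability and the remaining gold
have both decayed.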


§8. The Solution

It may be shown, cf. [2], [3], [17], that the solution to
(7.5) is given by

(8.1)
(a)  if p_1 r_1 x/(1 - p_1) > p_2 r_2 y/(1 - p_2), take the A choice;

(b)  if p_1 r_1 x/(1 - p_1) < p_2 r_2 y/(1 - p_2), take the B choice;

(c)  if p_1 r_1 x/(1 - p_1) = p_2 r_2 y/(1 - p_2), either choice is optimal.

Geometrically,

(8.2)  [Figure: the positive (x, y) quadrant, divided by the line
       p_1 r_1 x/(1 - p_1) = p_2 r_2 y/(1 - p_2) into a region where
       A is chosen and a region where B is chosen.]

Observe that this type of solution is ideally suited to a
problem involving chance effects. It tells what to do next in
terms of where one is. Clearly, if from every position, the
next move is determined, one can determine all possible optimal
sequences. However, in this case as in so many similar cases,
the solution is most clearly presented in the above form.
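
The passage from the rule to a sequence of choices may be sketched as
follows. The function names are hypothetical, p1 and p2 are assumed to
be less than 1, and ties are broken in favor of A, which is an
arbitrary choice since either move is then optimal.

```python
def choose_mine(x, y, p1, r1, p2, r2):
    """Rule (8.1): compare the two ratios; assumes p1 < 1 and p2 < 1."""
    ratio_a = p1 * r1 * x / (1 - p1)
    ratio_b = p2 * r2 * y / (1 - p2)
    if ratio_a > ratio_b:
        return "A"
    if ratio_a < ratio_b:
        return "B"
    return "either"

def rule_sequence(x, y, p1, r1, p2, r2, n):
    """First n choices prescribed by the rule while the machine survives."""
    seq = []
    for _ in range(n):
        c = choose_mine(x, y, p1, r1, p2, r2)
        c = "A" if c == "either" else c      # arbitrary tie-break
        seq.append(c)
        if c == "A":
            x *= 1 - r1                      # Anaconda is depleted
        else:
            y *= 1 - r2                      # Bonanza is depleted
    return "".join(seq)
```

With x = 10, y = 5, p1 = 0.5, r1 = 0.2, p2 = 0.9, r2 = 0.1 the ratios
are 2 and 4.5, so the rule begins with B and continues with B until
the second ratio has fallen below the first.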

For further details concerning problems of this type, we
refer to [2], [9], and [16], among others.

§9. Discussion of the Solution

One of the principal reasons for attacking problems of the
above type, which are extremely idealized and simplified versions
of problems occurring in applications, lies in the fondly
cherished hope that the pattern of the solution may make itself
clear. Interpreting the mathematical solution in terms of
intuitive concepts, we may discover some metaphysical concept such as
a "principle of least action" which we can apply to problems of
more complicated type.

Let us see what interpretation we can give to the solution
given in (8.1). The expression p_1 r_1 x/(1 - p_1) has as its
numerator p_1 r_1 x, the immediate expected gain from an operation, while
its denominator is (1 - p_1), the probability that the machine will
be destroyed, which is to say, the immediate expected loss. The
expression p_2 r_2 y/(1 - p_2) consists of a similar ratio.

Consequently, both expressions are ratios of immediate
expected gain to immediate expected loss, and the optimal policy
is to choose at each stage the operation which maximizes the ratio.
Although this policy is not an optimal policy for all such
problems, it is an excellent rule-of-thumb, and one which may
readily be applied.

§10. A General Description of Dynamic Programming Problems

Having given some simple examples of dynamic programming
problems, let us now see if we can, in some general way,
characterize these problems. They possess the following common features:

(10.1)
(a)  Multi-stage processes are involved.

(b)  At each stage, the state of the process is
     described by a small number of parameters.

(c)  The effect of a decision at any stage is to
     transform this set of parameters into a similar
     set.


We have purposely left the description a bit vague, since
we feel that it is the spirit of the problem rather than the
letter which is significant. A certain amount of ingenuity is
always required in attacking new questions, and no amount of
axiomatics and rigid prescriptions can ever banish it.

Add to the above the following simple
Principle of Optimality: An optimal policy has the property
that whatever the initial state and initial decision may be, the
remaining decisions must constitute an optimal policy with regard
to the state resulting from the first decision, and we have the
basic ingredients of the theory of dynamic programming. The rest
is mathematics and experience.

§11. Some Typical Problems

To illustrate the way problems of multi-stage type occur
in these fields, let us cite some typical problems:

1. A Scheduling Problem: Suppose we have a number of
different objects which must be processed by a number of machines
of different type. We assume that each machine can process only
one item at a time and that the machines must be used in a fixed
order. Given the times required for each machine to process
each item, in general different, in what order should the
objects be processed so as to minimize the total time required
to process the complete set of items?

2. A Logistics Problem: Over a period of years, it is
necessary to purchase a number of different types of equipment
with different job performance ratings, different costs, and
different salvage or resale values, in order to perform a
number of assigned tasks. How should money be allocated to purchase
the different classes of equipment so as to minimize the amount
of money required to do a certain job, or conversely, so as to
maximize the job done for a given appropriation of money?

3. A Smoothing Problem: There is a fluctuating demand for
a product which requires a certain production force of employees
at any given time. If the actual number of employees is greater
than required, a certain loss is incurred due to nonproductivity.
On the other hand, a certain loss is incurred whenever new
employees are hired. What production force should be maintained
so as to minimize the total loss over some fixed time period?

4. An Optimal Inventory Problem: At some initial time we
have a quantity of merchandise in stock and are given the
information that at the end of one time period we will be required
to deliver a certain quantity of this merchandise. The precise
amount required is not known, but a distribution curve for the
demand is known. To meet this demand we may order more
merchandise at a cost depending upon the amount ordered. If the
demand exceeds the amount in stock, a penalty depending upon the
deficit is levied and the request is fulfilled as far as possible.

Assuming that the situation repeats itself periodically
and that future costs are discounted at a fixed rate, what
ordering policy minimizes the over-all expected cost?


5. A Control Problem: We are given an engineering system
which is ruled by a system of differential or difference
equations. To maintain the system in its desired state, it is
necessary to exert some control, the mathematical manifestation of
which is a forcing term.

It is desired to control the system in such a way that the
total cost, which is compounded of the cost of deviation from
the desired state, plus the cost of control, is a minimum.


6. Economic Investment: In managing a business enterprise,
we have our choice of taking money out as immediate profit,
or of reinvesting the money to enlarge the business and increase
future profit. What reinvestment policy maximizes the total
profit derived over a given time period?

7. Bottleneck Problems: Suppose that we have a complex
of industries, as for example, steel, tool, and auto, all
employed in the production of one particular item, such as autos.
At any particular time we have our choice of allocating resources
such as money, steel, and tools, to produce steel, tools, or
autos, or to build steel factories, tool factories, or auto
factories.

What allocation policy maximizes the total number of autos
produced over a given time period?

8. Learning Theory: Suppose that we have two hundred
critically ill patients and two new wonder drugs as yet untested.
How should these drugs be tested on the patients so as to
maximize the expected number of patients who are cured?

9. Testing Theory: Suppose we are testing a group of
objects for a specific property and are given the probability,
for each object, that the test will disclose this property if it
exists, and the prior probability that each object has this
property. What testing procedure will minimize the expected time
required to determine a given number of objects with the required
property?


For those interested in the mathematical treatment of these
problems, we cite the following references: